NMR                       52.38         19.21         8.57          4.36
Uncertainty               0.90          1.03          0.45          0.22
ff96                      52.12         19.98         8.61          4.62
ff96_MVN                  52.32 ± 0.20  19.57 ± 0.23  8.59 ± 0.12   4.55 ± 0.04
ff96_dirichlet            52.29 ± 0.17  19.64 ± 0.20  8.60 ± 0.11   4.56 ± 0.04
ff96_maxent               52.21 ± 0.14  19.76 ± 0.18  8.62 ± 0.11   4.58 ± 0.04
ff99                      52.82         17.97         8.32          4.62
ff99_MVN                  52.44 ± 0.35  18.67 ± 0.59  8.38 ± 0.23   4.60 ± 0.07
ff99_dirichlet            52.49 ± 0.34  18.71 ± 0.60  8.41 ± 0.19   4.61 ± 0.06
ff99_maxent               52.47 ± 0.25  18.75 ± 0.56  8.43 ± 0.12   4.65 ± 0.06
ff99sbnmr-ildn            52.42         18.34         8.18          4.54
ff99sbnmr-ildn_MVN        52.43 ± 0.05  18.34 ± 0.12  8.18 ± 0.03   4.54 ± 0.01
ff99sbnmr-ildn_dirichlet  52.42 ± 0.05  18.35 ± 0.12  8.18 ± 0.03   4.54 ± 0.01
ff99sbnmr-ildn_maxent     52.43 ± 0.05  18.35 ± 0.12  8.18 ± 0.03   4.54 ± 0.01
charmm27                  52.52         18.24         8.25          4.56
charmm27_MVN              52.53 ± 0.22  18.37 ± 0.44  8.24 ± 0.14   4.53 ± 0.05
charmm27_dirichlet        52.52 ± 0.22  18.45 ± 0.46  8.25 ± 0.14   4.54 ± 0.05
charmm27_maxent           52.43 ± 0.19  18.56 ± 0.45  8.25 ± 0.14   4.55 ± 0.05
oplsaa                    52.19         19.64         8.59          4.60
oplsaa_MVN                52.32 ± 0.15  19.43 ± 0.24  8.59 ± 0.09   4.56 ± 0.03
oplsaa_dirichlet          52.28 ± 0.14  19.51 ± 0.19  8.58 ± 0.09   4.57 ± 0.03
oplsaa_maxent             52.22 ± 0.14  19.62 ± 0.19  8.57 ± 0.08   4.58 ± 0.03



Table S1: Predicted observables (chemical shifts) for each force field and BELT model. The
row "Uncertainty" refers to the estimated prediction error for the physical models of chemical
shifts. The uncertainties for each BELT prediction refer to the standard deviation of the
MCMC trace of each observable.
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ff96_D 
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ff96_M 
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ff99_D 


11.47 ± 0.26 


1.70 ± 0.19 
ff99sbnmr-ildn_N  11.47 ± 0.06  1.83 ± 0.07   2.31 ± 0.10   1.00 ± 0.08   6.02 ± 0.24   8.45 ± 0.07
ff99sbnmr-ildn_D  11.49 ± 0.05  1.83 ± 0.07   2.30 ± 0.10   1.00 ± 0.08   6.03 ± 0.25   8.48 ± 0.06
ff99sbnmr-ildn_M  11.48 ± 0.06  1.82 ± 0.07   2.31 ± 0.10   1.00 ± 0.08   6.01 ± 0.25   8.47 ± 0.07
charmm27          11.23         2.03          1.79          1.36          6.34          8.10
charmm27_N        11.30 ± 0.20  1.83 ± 0.17   2.32 ± 0.20   1.15 ± 0.24   5.70 ± 0.60   8.24 ± 0.27
charmm27_D        11.34 ± 0.21  1.81 ± 0.18   2.33 ± 0.20   1.14 ± 0.24   5.69 ± 0.60   8.30 ± 0.28
charmm27_M        11.65 ± 0.21  1.78 ± 0.17   2.32 ± 0.22   1.15 ± 0.21   5.68 ± 0.58   8.57 ± 0.22
oplsaa            11.09         2.16          1.89          0.88          7.03          8.12
oplsaa_N          11.11 ± 0.19  2.04 ± 0.23   2.23 ± 0.17   0.92 ± 0.20   6.29 ± 0.50   8.15 ± 0.22
oplsaa_D          11.14 ± 0.18  1.99 ± 0.16   2.20 ± 0.17   0.89 ± 0.19   6.40 ± 0.46   8.20 ± 0.20
oplsaa_M          11.28 ± 0.23  1.95 ± 0.13   2.22 ± 0.18   0.84 ± 0.18   6.45 ± 0.43   8.37 ± 0.24



Table S2: Predicted observables (scalar couplings) for each force field and BELT model. The
row "Uncertainty" refers to the estimated prediction error for the physical models of scalar
couplings. The uncertainties for each BELT prediction refer to the standard deviation of
the MCMC trace of each observable. Note: to save space, we have abbreviated the priors
Normal, Dirichlet, and maximum entropy as N, D, and M, respectively.
                 maxent   dirichlet   MVN
ff96             0.71     0.68        0.67
ff99             0.69     0.63        0.67
ff99sbnmr-ildn   0.68     0.68        0.68
charmm27         0.71     0.62        0.60
oplsaa           0.69     0.64        0.63

                 maxent   dirichlet   MVN
ff96             0.07     0.08        0.10
ff99             0.13     0.12        0.11
ff99sbnmr-ildn   0.04     0.04        0.04
charmm27         0.10     0.11        0.11
oplsaa           0.08     0.08        0.08



Table S3: Predicted PPII populations (top) and uncertainties (bottom).






maxent dirichlet MVN 



ff96 


0.27 


0.26 


0.25 


ff99 
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0.19 


0.18 


ff99sbnmr-ildn 


0.24 
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0.24 


charmm27 
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0.22 
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0.07 


0.07 
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ff99 


0.09 


0.07 


0.07 


ff99sbnmr-ildn 


0.04 


0.03 


0.03 


charmm27 


0.08 


0.07 


0.07 


oplsaa 


0.06 


0.06 


0.05 



Table S4: Predicted $\beta$ populations (top) and uncertainties (bottom).
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maxent dirichlet MVN 



ff96 


0.02 


0.05 
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ff99 


0.07 


0.19 
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ff99sbnmr-ildn 


0.07 


0.07 
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0.08 
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0.12 




                 maxent   dirichlet   MVN
ff96             0.04     0.07        0.09
ff99             0.14     0.13        0.12
ff99sbnmr-ildn   0.03     0.02        0.03
charmm27         0.08     0.10        0.10
oplsaa           0.08     0.07        0.07



Table S5: Predicted $\alpha_r$ populations (top) and uncertainties (bottom).
                 maxent   dirichlet   MVN
ff96             0.01     0.01        0.01
ff99             0.00     0.00        0.00
ff99sbnmr-ildn   0.00     0.00        0.00
charmm27         0.02     0.02        0.03
oplsaa           0.01     0.02        0.04

                 maxent   dirichlet   MVN
ff96             0.01     0.02        0.02
ff99             0.00     0.00        0.00
ff99sbnmr-ildn   0.00     0.00        0.00
charmm27         0.01     0.02        0.02
oplsaa           0.01     0.03        0.05



Table S6: Predicted $\alpha_l$ populations (top) and uncertainties (bottom).
                          all     train   test
ff96                      2.55    2.91    1.99
ff96_MVN                  0.58    0.35    0.93
ff96_dirichlet            0.55    0.35    0.86
ff96_maxent               0.58    0.38    0.88
ff99                      10.80   12.94   7.59
ff99_MVN                  0.98    0.63    1.51
ff99_dirichlet            0.95    0.63    1.43
ff99_maxent               0.98    0.58    1.58
ff99sbnmr-ildn            0.39    0.31    0.50
ff99sbnmr-ildn_MVN        0.43    0.35    0.55
ff99sbnmr-ildn_dirichlet  0.44    0.35    0.57
ff99sbnmr-ildn_maxent     0.42    0.35    0.54
charmm27                  1.44    1.65    1.13
charmm27_MVN              0.74    0.60    0.95
charmm27_dirichlet        0.73    0.58    0.96
charmm27_maxent           0.73    0.53    1.04
oplsaa                    2.26    1.10    4.01
oplsaa_MVN                0.94    0.48    1.64
oplsaa_dirichlet          1.00    0.50    1.76
oplsaa_maxent             1.06    0.52    1.87



Table S7: Reduced $\chi^2$ for MD simulations and BELT ensembles. The 'all', 'training', and
'test' datasets have 10, 6, and 4 measurements, respectively.
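
As a rough illustration of how a reduced $\chi^2$ of this kind can be computed, the sketch below
assumes the common definition: the mean over measurements of the squared deviation, scaled by
the uncertainty. The function name and that normalization are assumptions rather than details
stated in the text. The example inputs are the ff96, NMR, and Uncertainty chemical-shift rows
of Table S1, so the result covers only those four observables and is not expected to match
any entry of Table S7.

```python
import numpy as np

def reduced_chi2(predicted, measured, sigma):
    """Mean squared deviation in units of the uncertainty (assumed definition)."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    z = (predicted - measured) / sigma
    return float(np.mean(z ** 2))

# Chemical-shift rows copied from Table S1.
nmr = [52.38, 19.21, 8.57, 4.36]
uncertainty = [0.90, 1.03, 0.45, 0.22]
ff96 = [52.12, 19.98, 8.61, 4.62]

chi2_shifts_ff96 = reduced_chi2(ff96, nmr, uncertainty)
```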
Figure S1: Ramachandran plots (2D histograms) of MD simulations and BELT models.
Figure S2: MCMC traces of the first component of $\alpha$.
Figure S3: Cross-validated reduced $\chi^2$. Each panel shows one force field (ff96, ff99,
ff99sbnmr-ildn, charmm27, oplsaa) with curves for the MVN, dirichlet, and maxent priors.
Cross validation was performed using twenty-fold subsampled trajectory data to make
calculations computationally tractable.
Appendix S1. Connecting BELT, Maximum Entropy,
and Hyperensembles

Bayesian Energy Landscape Tilting generalizes a recent maximum entropy formalism (1) to
include statistical uncertainty. Here, we outline the previous results and detail the
connection between BELT and the previous maximum entropy formalism. We also show how a
hyperensemble formalism of Crooks (2) naturally leads to BELT-like models.



Outline of Maximum Entropy Formalism 

The previous work (1) used Jaynes' maximum entropy arguments (3) to derive a new
approach to constraining simulations. Here we outline those arguments for the case of a single
observable.

The maximum entropy formalism relies on finding the least informative probability
distribution that is compatible with some known constraints. The information entropy is used
as the metric for quantifying the information content of a distribution:

$$S(p) = -\int p(x) \log p(x) \, dx$$

The method of Chodera and Pitera uses three constraints on $p(x)$. First, $p(x)$ must be a
well-defined probability distribution:

$$\int p(x) \, dx = 1$$

Second, $p(x)$ must give the correct average potential energy:

$$\int U(x) p(x) \, dx = \frac{3}{2} N k_B T$$

Finally, we assume that we have access to an ensemble-average measurement $F$ and a
function $f(x)$ that predicts the observable as a function of atomic coordinates:

$$\int f(x) p(x) \, dx = F$$

This constrained optimization problem is solved via Lagrange multipliers, eventually
leading to the following gradient condition:

$$\frac{\partial S}{\partial p} - \lambda_0 \frac{\partial g_0}{\partial p} - \lambda_1 \frac{\partial g_1}{\partial p} - \lambda_2 \frac{\partial g_2}{\partial p} = 0$$

Here, the functions $g_0$, $g_1$, and $g_2$ are the following constraint equations:

$$g_0 = \int dx \, p(x) - 1$$
$$g_1 = \int dx \, U(x) p(x) - \frac{3}{2} N k_B T$$
$$g_2 = \int dx \, f(x) p(x) - F$$
The solution to this optimization problem is given by 



$$p(x) \propto \exp(-\beta U(x) - \alpha f(x))$$



Sampling the potential $U(x) + \alpha f(x)$ provides the least biased ensemble that is consistent
with the measurement $F$. To determine $\alpha$, Chodera and Pitera minimize an objective
function $\Lambda$, which is related to the $\alpha$-dependent partition function $Z(\alpha)$ and
the experimental measurement $F$:

$$\Lambda = \log Z(\alpha) + \alpha F$$

They also extend their calculation to include multiple measurements, leading to the
following objective function:

$$\Lambda = \log Z(\alpha) + \sum_k \alpha_k F_k$$

To minimize $\Lambda$, we calculate the gradient and set it to zero:

$$\frac{\partial \Lambda}{\partial \alpha_k} = -\langle f_k \rangle_\alpha + F_k = 0$$

The obvious solution, when feasible, is the choice of $\alpha$ such that
$\langle f_i(x) \rangle_\alpha = F_i$.

To illustrate this approach, consider the case of a single observable $f_1(x)$ (and therefore a
single parameter $\alpha_1$). Suppose the molecule of interest shows a bimodal observable with two
equally populated states. If we let $\alpha_1 = 0$, then the biasing potential is 0 everywhere and
our reweighted ensemble simply returns the results of the MD simulation (Fig. S4b). If we
let $\alpha_1 = -1$, conformations with large values of $f_1(x)$ are upweighted, while conformations
with lower values of $f_1(x)$ are downweighted (Fig. S4a). Finally, if $\alpha_1 = 1$, the ensemble
shifts in the opposite direction (Fig. S4c).

Figure S4: (a, b, c): Raw ($\alpha_1 = 0$) and reweighted (i.e., tilted) histograms of a
one-dimensional observable. (d, e, f): The same, but plotted as free energies (i.e., $-kT \log p$).

Connecting BELT and Maximum Entropy

In BELT, we sample the following log posterior distribution:

$$LP(\alpha) = -\sum_i \frac{(\langle f_i(x) \rangle_\alpha - F_i)^2}{2 \sigma_i^2} + \log P(\alpha)$$

If we instead maximize the posterior probability, the problem becomes equivalent to
setting the derivative of $LP$ to zero. Assuming that the prior distribution is constant, the
derivative is calculated to be the following:

$$\frac{\partial LP}{\partial \alpha_k} = -\sum_i \frac{\langle f_i(x) \rangle_\alpha - F_i}{\sigma_i^2} \frac{\partial \langle f_i(x) \rangle_\alpha}{\partial \alpha_k}$$

As before, if we find a value of $\alpha$ such that $\langle f_i(x) \rangle_\alpha = F_i$, we will
maximize the posterior probability. Thus, under ideal conditions, we expect similar results
using the maximum entropy approach and BELT.
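
The maximum entropy condition $\langle f \rangle_\alpha = F$ can be solved numerically on a finite
set of samples. The sketch below is illustrative only: it assumes equally weighted samples from
the unbiased simulation, uses the sign convention in which weights are proportional to
$\exp(-\alpha f)$, and applies Newton's method to $\Lambda(\alpha) = \log Z(\alpha) + \alpha F$ for a
single observable. The function names and the synthetic bimodal data are hypothetical.

```python
import numpy as np

def tilted_weights(f, alpha):
    """Weights proportional to exp(-alpha * f), normalized to sum to one."""
    log_w = -alpha * f
    log_w = log_w - log_w.max()      # stabilize the exponentials
    w = np.exp(log_w)
    return w / w.sum()

def solve_alpha(f, F, n_iter=50, tol=1e-12):
    """Newton iterations on Lambda(alpha) = log Z(alpha) + alpha * F."""
    alpha = 0.0
    for _ in range(n_iter):
        w = tilted_weights(f, alpha)
        mean = np.dot(w, f)
        var = np.dot(w, (f - mean) ** 2)
        grad = F - mean              # dLambda/dalpha = -<f>_alpha + F
        if abs(grad) < tol:
            break
        alpha -= grad / var          # d2Lambda/dalpha2 = Var_alpha(f) > 0
    return alpha

# Synthetic bimodal observable with two equally populated states.
rng = np.random.default_rng(0)
f = np.concatenate([rng.normal(-1.0, 0.3, 5000), rng.normal(1.0, 0.3, 5000)])

alpha_star = solve_alpha(f, F=0.5)
tilted_mean = np.dot(tilted_weights(f, alpha_star), f)
```

In this toy case the target lies above the unbiased mean, so the solution has a negative
$\alpha$, consistent with the statement that negative values upweight conformations with large
values of the observable.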



Connecting BELT and Hyperensembles 

It has been argued (2) that a non-equilibrium ensemble $p$ ought to be characterized by the
distance from equilibrium, as measured by the relative entropy. Here we derive the BELT
model using this entropic prior as a starting point. In that work, Crooks defines a probability
distribution on ensembles given by

$$P(p) \propto \exp(-\lambda D(p \| p^0))$$

In the above expression, $D(p \| p^0)$ refers to the KL divergence of ensembles $p$ and $p^0$,
while $\lambda$ is a parameter that characterizes the distance from equilibrium. Note that the KL
divergence, $D(p \| p^0)$, can be expressed as a sum over the conformational states $j = 1, ..., m$:

$$D(p \| p^0) = \left\langle \log \frac{p}{p^0} \right\rangle_p = \sum_j p_j \log \frac{p_j}{p_j^0}$$

We let $p^0$ be a reference Boltzmann ensemble, which in our case will be a simulation
performed in a particular force field. Now, let us suppose that we have a single measurement
$F$ that can be calculated via the ensemble average $\langle f(x) \rangle_p$. Suppose that the
likelihood of some measurement is given by

$$P(F | p) = N(\langle f \rangle_p, \sigma^2)$$

By Bayes' Theorem, we have that:

$$P(p | F) \propto P(F | p) P(p)$$

Letting $C$ be a normalizing constant, we have:

$$P(p | F) = C \exp\left(-\frac{1}{2\sigma^2}(\langle f \rangle_p - F)^2\right) \exp(-\lambda D(p \| p^0))$$

For convenience, let $\phi$ denote the ensemble average, in the ensemble $p$, of the experimental
observable of interest:

$$\phi = \langle f(x) \rangle_p$$

Suppose that $p^*(\phi)$ is the maximum entropy ensemble (in the sense of (1); see "Outline
of Maximum Entropy Formalism") such that $\langle f \rangle_{p^*} = \phi$. The set of
$\{p^*(\phi)\}$ is a small subset of the set of all ensembles. We now wish to see what happens
when we integrate out or marginalize over the larger class of ensembles.

We now wish to express the arbitrary ensemble $p$ as a perturbation from the maximum
entropy ensemble $p^*(\phi)$. We introduce a perturbation variable $\Delta_j = p_j - p^*_j$ and
change variables from $\{p_j\}$ to $(\phi, \{\Delta_j\})$, where $\Delta$ is a correction to the
maximum entropy ensemble $p^*(\phi)$. We express the posterior probability in the new variables
$\phi$ and $\Delta$:

$$P(\phi, \Delta | F) = C |J_1(\phi, \Delta)| \exp\left(-\frac{1}{2\sigma^2}(\langle f \rangle_p - F)^2\right) \exp(-\lambda D(\phi, \Delta \| p^0))$$

In the above expression, $J_1(\phi, \Delta)$ is the Jacobian of the coordinate transformation. With
the above assumptions, the probability can be simplified:

$$P(\phi, \Delta | F) = C |J_1(\phi, \Delta)| \exp\left(-\frac{1}{2\sigma^2}(\phi - F)^2\right) \exp(-\lambda D(\phi, \Delta \| p^0))$$

For a given value of $\phi$, $p^*(\phi)$ (i.e., $\Delta = 0$) maximizes the entropy. This suggests a
quadratic approximation to the entropy:

$$D(\phi, \Delta \| p^0) = \left\langle \log \frac{p}{p^0} \right\rangle_p \approx D(\phi, 0 \| p^0) + \frac{1}{2} \Delta^T H(\phi) \Delta$$

Here, $H_{ij}(\phi) = \frac{\partial^2 D}{\partial \Delta_i \partial \Delta_j}$, evaluated at the point
of the maximum entropy ensemble ($\Delta = 0$). Inserting this expression in the probability gives:

$$P(\phi, \Delta | F) = C |J_1(\phi, \Delta)| \exp\left(-\frac{1}{2\sigma^2}(\phi - F)^2\right) \exp\left(-\lambda D(\phi, 0 \| p^0) - \frac{\lambda}{2} \Delta^T H(\phi) \Delta\right)$$



From the perspective of BELT, $\Delta$ is a nuisance parameter: we want an ensemble that
is described by a more parsimonious representation. We therefore integrate over $\Delta$ (i.e.,
marginalize) to achieve a probability that depends entirely on $\phi$.

$$P(\phi | F) = C \int |J_1(\phi, \Delta)| \exp\left(-\frac{1}{2\sigma^2}(\phi - F)^2\right) \exp\left(-\lambda D(\phi, 0 \| p^0) - \frac{\lambda}{2} \Delta^T H(\phi) \Delta\right) d\Delta$$

$$P(\phi | F) = C \exp\left(-\frac{1}{2\sigma^2}(\phi - F)^2\right) \exp(-\lambda D(\phi, 0 \| p^0)) \int |J_1(\phi, \Delta)| \exp\left(-\frac{\lambda}{2} \Delta^T H(\phi) \Delta\right) d\Delta$$

For an analytically tractable calculation, $J_1(\phi, \Delta)$ must be independent of $\Delta$, so that
we can remove it from under the integral. To see that this is true, we first explicitly list all free
parameters in the two representations. In the $p$ representation, the free parameters are
$p_1, ..., p_{m-1}$; $p_m$ can then be calculated as $p_m = 1 - \sum_{j=1}^{m-1} p_j$. In the new
representation, the free parameters are $\Delta_1, ..., \Delta_{m-2}, \phi$; the remaining terms can
be calculated by noting that $\sum_j^m \Delta_j = 0$ and $\sum_j^m \Delta_j f_j = 0$. The
transformation between these coordinate systems is explicitly given by

$$\Delta_1 = p_1 - p^*_1$$
$$\vdots$$
$$\Delta_{m-2} = p_{m-2} - p^*_{m-2}$$
$$\phi = \sum_{j=1}^{m-1} p_j f_j + \left(1 - \sum_{j=1}^{m-1} p_j\right) f_m$$

This transformation is linear in $p_j$, so the Jacobian is constant. The integral therefore
simplifies as:

$$P(\phi | F) = C \exp\left(-\frac{1}{2\sigma^2}(\phi - F)^2\right) \exp(-\lambda D(\phi, 0 \| p^0)) |J_1| \int \exp\left(-\frac{\lambda}{2} \Delta^T H(\phi) \Delta\right) d\Delta$$

The above expression requires integrating subject to the constraints
$-p^*_j(\phi) < \Delta_j < 1 - p^*_j(\phi)$. We assume that the likelihood is peaked around
$\Delta = 0$, so that the likelihood vanishes as we move far from $\Delta = 0$. We can therefore
replace the constrained integral with one over all space:

$$P(\phi | F) = C \exp\left(-\frac{1}{2\sigma^2}(\phi - F)^2\right) \exp(-\lambda D(\phi, 0 \| p^0)) |J_1| \int \exp\left(-\frac{\lambda}{2} \Delta^T H(\phi) \Delta\right) d\Delta_1 ... d\Delta_{m-2}$$

$$P(\phi | F) = C \exp\left(-\frac{1}{2\sigma^2}(\phi - F)^2\right) \exp(-\lambda D(\phi, 0 \| p^0)) |J_1| \sqrt{\frac{(2\pi)^{m-2}}{\lambda^{m-2} \det(H(\phi))}}$$

To connect to BELT, we now change variables from $\phi$ to $\alpha$ (the tilting parameter of a
BELT model), which introduces another Jacobian $J_2(\alpha)$:

$$P(\alpha | F) = C \exp\left(-\frac{1}{2\sigma^2}(\langle f \rangle_\alpha - F)^2\right) \exp(-\lambda D(\alpha, 0 \| p^0)) |J_1| |J_2(\alpha)| \sqrt{\frac{(2\pi)^{m-2}}{\lambda^{m-2} \det(H(\alpha))}}$$

To gain perspective in the comparison to BELT, we calculate the logarithm and drop all
terms that are independent of $\alpha$:

$$\log P(\alpha | F) = -\frac{1}{2\sigma^2}(\langle f \rangle_\alpha - F)^2 - \lambda D(\alpha, 0 \| p^0) + \log |J_2(\alpha)| - \frac{1}{2} \log \det H(\alpha)$$

Assuming that our conformational samples were drawn from the Boltzmann ensemble $p^0$,
the first two terms are identical to BELT with the maxent prior. The remaining terms are
corrections that account for the entropic cost of restricting our ensemble to be described by
$\alpha$, rather than the entire space of possible ensembles. The advantage of working with
$\alpha$ is the dramatic reduction in the size of the parameter space. In the limit of
$\lambda \to \infty$ and $\sigma \to 0$ (and the approximations made above), the non-BELT terms
vanish from the log likelihood.

Another property to note is that the log likelihood is the only term dependent on $F$.
This implies that we can collect all the terms independent of $F$ and label them an effective
prior:

$$\log P(\alpha | F) = -\frac{1}{2\sigma^2}(\langle f \rangle_\alpha - F)^2 + \log P_{eff}(\alpha)$$

Improved priors on $\alpha$ could possibly have the effect of correcting for the approximations
(e.g. $\sigma \to 0$, $\lambda \to \infty$) used here in deriving BELT. In practice, however, we found
that deriving corrections to BELT using this approach did not lead to improved performance,
possibly because of the simplifications used in the present derivation.
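
For reference, the marginalization over the perturbation variable in this derivation relies on
the standard multivariate Gaussian integral. The block below restates that textbook identity for
an $(m-2)$-dimensional perturbation and a positive definite matrix $H(\phi)$; it is added for
convenience and is not part of the original derivation.

```latex
\int_{\mathbb{R}^{m-2}}
  \exp\left(-\frac{\lambda}{2}\,\Delta^{T} H(\phi)\,\Delta\right) d\Delta
  = \sqrt{\frac{(2\pi)^{m-2}}{\det\big(\lambda H(\phi)\big)}}
  = \sqrt{\frac{(2\pi)^{m-2}}{\lambda^{m-2}\,\det H(\phi)}}
```

Taking the logarithm of the right-hand side gives the $-\frac{1}{2} \log \det H$ correction, plus
terms that do not depend on the tilting parameter.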



Appendix S2. Derivation of Reweighting 

Here we derive the population estimator used in BELT. As in the main text, we use
subscripted angle brackets to indicate ensemble averages in reweighted ensembles:
$\langle h(x) \rangle_\alpha$ is the ensemble average of $h(x)$ in an ensemble that is perturbed by a
biasing potential $\Delta(x; \alpha) = \sum_i \alpha_i f_i(x)$.

$$\langle h(x) \rangle_\alpha = \frac{1}{Z(\alpha)} \int h(x) \exp[-U(x) - \Delta(x; \alpha)] \, dx$$

$Z(\alpha)$ denotes the partition function for the $\alpha$ ensemble. To proceed, we first note a
simple Zwanzig identity that allows us to relate samples taken from different ensembles:

$$\langle h(x) \rangle_\alpha = \frac{1}{Z(\alpha)} \int h(x) \exp[-U(x) - \Delta(x; \alpha)] \, dx = \frac{Z(0)}{Z(\alpha)} \langle h(x) \exp[-\Delta(x; \alpha)] \rangle_0$$

In the above expression, $\langle \cdot \rangle_0$ denotes an unperturbed ensemble (i.e., $\alpha = 0$)
and $Z(0)$ is the partition function of the unbiased ensemble ($\alpha = 0$). Now we sample from the
unperturbed ensemble to statistically estimate the expectation

$$\langle h(x) \exp[-\Delta(x; \alpha)] \rangle_0 \approx \frac{1}{m} \sum_{j=1}^{m} \exp[-\Delta(x_j; \alpha)] h(x_j)$$

By letting $h(x) = 1$, we can estimate the partition function $Z(\alpha)$ up to the constant
factor $Z(0)$:

$$\frac{Z(\alpha)}{Z(0)} \approx \frac{1}{m} \sum_{j=1}^{m} \exp[-\Delta(x_j; \alpha)]$$

Combining these equations, we have

$$\langle h(x) \rangle_\alpha \approx \sum_j h(x_j) \pi_j(\alpha)$$

where $\pi_j(\alpha)$ give estimates of the conformation weights at a particular value of $\alpha$:

$$\pi_j(\alpha) = \frac{\exp[-\Delta(x_j; \alpha)]}{\sum_k \exp[-\Delta(x_k; \alpha)]}$$

Thus, BELT is essentially exponential averaging applied to a weighted combination of
experiment-derived biasing potentials. However, the present work has introduced two key
advances. First, the use of Markov chain Monte Carlo allows rigorous uncertainty analysis.
Second, regularization reduces the high variance previously associated with exponential
averaging.
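
The population estimator derived in this appendix can be sketched in a few lines. The code below
is a minimal illustration, assuming that the per-frame predictions $f_i(x_j)$ are stored in an
array of shape (frames, observables); the function names and the synthetic data are
hypothetical, and the subtraction of the maximum before exponentiating is a standard numerical
safeguard rather than something described in the text.

```python
import numpy as np

def belt_weights(predictions, alpha):
    """Conformation weights pi_j(alpha) = exp(-Delta_j) / sum_k exp(-Delta_k).

    predictions: (m, n) array with predictions[j, i] = f_i(x_j) for frame j.
    alpha: (n,) array of tilting parameters.
    """
    delta = predictions @ alpha          # Delta(x_j; alpha) = sum_i alpha_i f_i(x_j)
    log_w = -delta
    log_w = log_w - log_w.max()          # avoid overflow in the exponentials
    w = np.exp(log_w)
    return w / w.sum()

def reweighted_average(h, predictions, alpha):
    """Estimate <h(x)>_alpha as sum_j h(x_j) pi_j(alpha)."""
    return float(belt_weights(predictions, alpha) @ h)

rng = np.random.default_rng(1)
predictions = rng.normal(size=(1000, 2))     # synthetic f_i(x_j), purely illustrative
alpha = np.array([0.5, -0.2])

pi = belt_weights(predictions, alpha)
avg_f0 = reweighted_average(predictions[:, 0], predictions, alpha)
```

With all tilting parameters set to zero the weights reduce to the uniform value 1/m, which
recovers the raw simulation averages.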
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Appendix S3. Alternative Error Models 



The model presented in the main text assumes independent normal deviations between
measurements and the predicted ensemble. This model is a useful approximation that leads to
a straightforward likelihood. However, in some situations, one might expect correlation
between ensemble measurements. Detecting this correlation would require additional
experimental measurements. However, it is possible to modify the likelihood to account for
correlations between the predicted observables. The net result is a modified log likelihood:

$$LL(\alpha) = -\frac{1}{2} z^T P^{-1} z$$

where $P$ is the correlation matrix of the observables, $P_{ij} = \mathrm{Cor}(f_i(x), f_j(x))$,
and $z$ is the deviation between the $\alpha$ ensemble and the measurement, measured in units of
the known uncertainty $\sigma_i$: $z_i = \frac{\langle f_i(x) \rangle_\alpha - F_i}{\sigma_i}$.
Using this model will likely lead to increased estimates of uncertainties.

Other possible error models involve modifying the assumption of normality. A nor- 
mal model penalizes models by the squared deviation from the experimental measurements. 
However, expert knowledge may sometimes suggest different error models. For example, one 
could imagine a model where small deviations are not penalized at all. Such models could 
be inserted into the same MCMC framework with little extra effort. 
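
The two alternatives discussed in this appendix can be written as drop-in log likelihood
functions. The sketch below is illustrative: the first function implements the correlated
normal model, and the second is one hypothetical example of a model in which small deviations
are not penalized (a flat-bottomed quadratic with an assumed tolerance of one uncertainty unit).
Neither the function names nor the tolerance come from the text.

```python
import numpy as np

def scaled_deviation(avg_pred, measured, sigma):
    """z_i = (<f_i>_alpha - F_i) / sigma_i."""
    avg_pred = np.asarray(avg_pred, dtype=float)
    measured = np.asarray(measured, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return (avg_pred - measured) / sigma

def correlated_log_likelihood(avg_pred, measured, sigma, corr):
    """LL = -1/2 z^T P^{-1} z, where corr is the correlation matrix P."""
    z = scaled_deviation(avg_pred, measured, sigma)
    return -0.5 * float(z @ np.linalg.solve(corr, z))

def flat_bottom_log_likelihood(avg_pred, measured, sigma, tolerance=1.0):
    """Hypothetical model: deviations below `tolerance` sigmas cost nothing."""
    z = np.abs(scaled_deviation(avg_pred, measured, sigma))
    excess = np.clip(z - tolerance, 0.0, None)
    return -0.5 * float(np.sum(excess ** 2))

# With an identity correlation matrix the first model reduces to the
# independent normal model of the main text.
ll_independent = correlated_log_likelihood([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], np.eye(2))
corr = np.array([[1.0, 0.5], [0.5, 1.0]])
ll_correlated = correlated_log_likelihood([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], corr)
```

In practice the correlation matrix could be estimated from the per-frame predictions, for
example with `np.corrcoef(predictions, rowvar=False)`.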
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Appendix S4. Choice of Prior 

Maximum Entropy (maxent) Prior 

As described in the main text, the maximum entropy prior is given by

$$\log P(\alpha) = -\lambda \sum_j \pi_j(\alpha) \log \frac{\pi_j(\alpha)}{\pi_j^0}$$

Typically the reference populations are uniform; that is, $\pi_j^0 = \frac{1}{m}$. This form of
regularization has previously been used in a formalism for modeling SAXS ensembles (4).

Dirichlet Prior 

We also consider the Dirichlet prior. Dirichlet priors are commonly used as conjugate priors
to multinomial random variables, that is, when dealing with counts and probabilities of
categorical data. The Dirichlet distribution is nonzero on the unit simplex and has the
following functional form:

$$f(\pi; s) = \frac{1}{B(s)} \prod_j \pi_j^{s_j - 1}$$

In the above equation, $s$ is a vector of hyperparameters that represent prior
"pseudocounts" on frames, while $B(s)$ is a normalization constant containing a product of gamma
functions:

$$B(s) = \frac{\prod_j \Gamma(s_j)}{\Gamma\left(\sum_j s_j\right)}$$

The Dirichlet prior is an obvious choice for BELT, because the object of interest is
the probability distribution on conformations. However, in BELT, we must restrict the
distribution to the subset of probability distributions that can be achieved via reweighting.
Thus, instead of $\pi_j$, we have $\pi_j(\alpha)$:

$$f(\alpha; s) = \frac{1}{B(s)} \prod_j \pi_j(\alpha)^{s_j - 1}$$

For our MCMC calculations, we work with the log probability:

$$\log f(\alpha; s) = -\log(B(s)) + \sum_j (s_j - 1) \log \pi_j(\alpha)$$

Note that the constant term is unimportant, as MCMC relies on the difference in log
probabilities:

$$\log f(\alpha; s) \simeq \sum_j (s_j - 1) \log \pi_j(\alpha)$$
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In practice, the Dirichlet prior has a large number of hyperparameters — the pseudocounts 
Si on each conformation. To avoid the need for many hyperparameters, we assume that 




Thus, we assume that the pseudocounts are proportional to the raw MD simulation
populations, which for constant temperature MD should be a uniform distribution. For
practical implementation in an MCMC sampler, we can drop terms that do not depend on
$\alpha$, which leads to the following:



j 

Note that this can be rearranged into the following form, which better illuminates the 
connection between the maxent and Dirichlet priors: 



Notice that this functional form is quite similar to the maxent prior that we previously
discussed. The difference between the maxent and Dirichlet priors can be explained in terms
of the relative entropy between two distributions $P$ and $Q$. The relative entropy is given by

$$D_{KL}(P \| Q) = \sum_i P_i \log \frac{P_i}{Q_i}$$

The relative entropy is not a symmetric relationship; that is,
$D_{KL}(P \| Q) \neq D_{KL}(Q \| P)$. The maxent and Dirichlet priors are simply the relative
entropy between $\pi(\alpha)$ and a reference distribution $\pi^0$, calculated in either direction.
For equilibrium molecular dynamics, the reference distribution is simply uniform ($\frac{1}{m}$).

Multivariate Normal (MVN) Prior

In the MVN prior, $\alpha \sim N(\mu, \Sigma)$. We let $\mu = 0$ to center the MVN around
$\alpha = 0$. This places the highest prior density on the raw simulation and allows regularization
of $\alpha$. To pick $\Sigma$, we note that the simple choice of $\Sigma_{ij} = \delta_{ij}$ leads to a
prior that depends on the units of $\alpha$; this dependence on the unit system is undesirable.
However, if we choose $\Sigma^{-1}_{ij} = \lambda \mathrm{Cov}(f_i(x), f_j(x))$, the units of
$\alpha_i$ and $f_i(x)$ cancel out in the MVN likelihood, leaving a result that is unit-invariant.
We have also introduced a scaling factor $\lambda$ to tune the amount of regularization.

Jeffreys Prior

Another choice of prior would be to use the Jeffreys prior, which is uninformative and
invariant under reparameterization. We found the Jeffreys prior to be less desirable, however,
because it does not necessarily place the prior maximum at $\alpha = 0$; thus, the Jeffreys prior
was unable to provide regularization.
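
The relationship between the three regularizing priors can be made concrete with a short sketch.
The code below is illustrative and rests on stated assumptions: the maxent and Dirichlet priors
are written as the relative entropy between the reweighted populations and the reference
populations taken in the two directions, both scaled by the same regularization strength and
defined only up to an additive constant, and the MVN prior uses a precision matrix proportional
to the covariance of the predicted observables. The function names and the example populations
are hypothetical.

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p || q) = sum_i p_i log(p_i / q_i); zero-p terms contribute zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def log_prior_maxent(pi_alpha, pi_ref, lam):
    """Maxent prior: -lambda * D(pi(alpha) || pi_ref)."""
    return -lam * kl_divergence(pi_alpha, pi_ref)

def log_prior_dirichlet(pi_alpha, pi_ref, lam):
    """Dirichlet-style prior: -lambda * D(pi_ref || pi(alpha)), up to a constant (assumed scaling)."""
    return -lam * kl_divergence(pi_ref, pi_alpha)

def log_prior_mvn(alpha, predictions, lam):
    """Zero-mean MVN prior with precision lambda * Cov(f_i, f_j), up to a constant (assumed form)."""
    precision = lam * np.atleast_2d(np.cov(predictions, rowvar=False))
    return -0.5 * float(alpha @ precision @ alpha)

pi_ref = np.full(4, 0.25)                       # uniform reference populations
pi_alpha = np.array([0.4, 0.3, 0.2, 0.1])       # example reweighted populations
lp_maxent = log_prior_maxent(pi_alpha, pi_ref, lam=5.0)
lp_dirichlet = log_prior_dirichlet(pi_alpha, pi_ref, lam=5.0)
```

Both relative-entropy priors vanish when the reweighted populations equal the reference
populations and become more negative as the ensemble is tilted further away, but they penalize
a given tilt by different amounts because the relative entropy is not symmetric.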
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Appendix S5. Determining Prior Strength Via Cross- 
Validation 

Each prior in this work contains a single free parameter, $\lambda$, which controls the level of
regularization. At least two different approaches can help select an appropriate value of $\lambda$:

1. Cross validation on simulation data (used in main text) 

2. Cross validation on experimental data 

Cross validation on simulation data 

We first discuss cross-validating on the simulation data. The underlying idea is that too
little regularization ($\lambda = 0$) leads to models that overfit the available simulation data and
generalize poorly; that is, repeating or extending the MD simulations would lead to a
different result. At the other extreme, underfit models ($\lambda = \infty$) will simply report the
unbiased simulations, leading to poor agreement with experiment. To perform this form of
cross-validation, first separate the simulation data into several independent subsets. Mark one
subset as the "test" set and fit the model on the remaining data (the "training" set). The
$\chi^2$ score is evaluated on the test data. We then repeat the process, letting the test set be
equal to each of the other subsets. The final $\chi^2$ is averaged over each of these iterations.
The value of $\lambda$ is chosen to minimize the test set error.

When using MD to generate conformations, one must perform cross-validation using
uncorrelated subsets of the data. This precludes the standard cross-validation approach
that uses randomly selected subsets of the data, because randomly selected folds will be
tainted by correlation between the folds. As a thought experiment, suppose one cross-validates
by dividing a trajectory into even and odd frames. Because of time-correlation in the
data, the even and odd subsets will contain essentially the same information, ruining the
cross-validation. To avoid these correlations, we recommend splitting the trajectory into
time-contiguous blocks. For the present work, we divided each trajectory into
two halves.
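
The time-contiguous splitting described above can be sketched as follows. This is a minimal
illustration with hypothetical function names; it only produces the frame indices for each
train/test split and leaves the model fitting and scoring to the caller. The default of two
blocks mirrors the two halves used in the present work.

```python
import numpy as np

def contiguous_blocks(n_frames, n_blocks=2):
    """Split frame indices 0..n_frames-1 into time-contiguous blocks."""
    return np.array_split(np.arange(n_frames), n_blocks)

def train_test_splits(n_frames, n_blocks=2):
    """Yield (train_indices, test_indices), holding out each block once."""
    blocks = contiguous_blocks(n_frames, n_blocks)
    for k, test in enumerate(blocks):
        train = np.concatenate([b for i, b in enumerate(blocks) if i != k])
        yield train, test

splits = list(train_test_splits(n_frames=10, n_blocks=2))
```

Contrast this with an even/odd split, where every test frame has an immediate time neighbor in
the training set.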

Cross validation on experimental data 

Cross validating on experimental data instead sets aside experimental measurements that
can then be used to evaluate model quality. One key difficulty with this approach, however,
is that experimental datasets are often sparse; that is, there are often only a few
information-rich measurements. This can lead to difficulties defining meaningful training and
test sets.

Cross Validation Results 

Here we summarize the values of $\lambda$ used in this work. These values were determined by
cross-validating on the simulation data.
$\lambda$
prior            MVN     dirichlet   maxent
forcefield
ff96             6.0     7.0         10
ff99             1       1.25        4
ff99sbnmr-ildn   100     100         100
charmm27         4       4           6
oplsaa           12.0    13.0        15



The corresponding cross-validated reduced $\chi^2$ scores are given below. These scores were
generated using the training set of experimental measurements, but done in the setting of
cross-validation on the simulation data. Thus, the models were fit to half the trajectory data
and evaluated on the other half. As before, we see similar performance with all priors. Full
sweeps of $\lambda$ are depicted in Fig. S3. To some extent, we expect similar performance between
the priors. This is because the relative entropy of normal distributions reduces to a weighted
Euclidean distance between the means (5). However, the observables in the present work are
non-normal, so the priors are not expected to give identical results.

For the amber99sbnmr-ildn results, cross validation recommends the use of large amounts
of regularization. This implies two things. First, this forcefield is already in excellent
agreement with experiment, so almost no reweighting is desired. Second, this may indicate
limitations in our estimates of the uncertainties in the chemical shifts and scalar couplings.
As a practical note, when large amounts of regularization are used, the resulting MCMC traces
will contain essentially no variance. It is thus necessary to use Bayesian Bootstrapping to get
meaningful error bars. For cases with less regularization, Bayesian Bootstrapping is less
critical because the MCMC traces account for the majority of the uncertainty.

For ff99 with the maxent prior, we found that calculations with very low amounts of
regularization suffer from occasional numerical instabilities. Essentially, it appears that the
regularization does not sufficiently penalize $\alpha_i \to \pm\infty$.



$\chi^2$ (cross-validated)
prior            MVN     dirichlet   maxent
forcefield
ff96             0.39    0.37        0.39
ff99             0.71    0.70        0.67
ff99sbnmr-ildn   0.35    0.35        0.34
charmm27         0.62    0.59        0.54
oplsaa           0.53    0.57        0.55






Appendix S6. Bayesian Bootstrapping 



The BELT model presented in the main text does not directly model simulation uncertainty.
This effect, however, can be introduced using a resampling technique known as Bayesian
bootstrapping (6). In Bayesian bootstrapping, every data point (i.e., conformation) is
associated with a Dirichlet random variable that models the effect of resampling the given
data points. In effect, each conformation is given a "prior" population that is allowed to
fluctuate around its average value of $\frac{1}{m}$.

One additional complication arises when using molecular dynamics simulations, which
produce a correlated time series. Because of this, it is not sufficient to simply use a Dirichlet
whose dimension is the same as the number of snapshots; such a procedure will significantly
underestimate uncertainties due to correlation between frames. Instead, one must first divide
the trajectory into independent blocks. The Dirichlet random variable is then chosen to
sample the relative weights of each of the independent blocks. Choosing the length of each
block can be done by applying Bayesian bootstrapping to the un-reweighted trajectory.
Given some observable of interest, $O$, one calculates the estimated uncertainty of $O$ for a
sequence of block lengths $B$, choosing the value of $B$ that maximizes that uncertainty. The
block length could also be calculated using other blocking methods (7) or by statistical
inefficiency analysis.


In practice, applying Bayesian bootstrapping involves repeating several BELT calculations
using different values of "prior" conformational populations that were drawn from a
Dirichlet random variable. The MCMC traces of each run are then pooled.
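
A single Bayesian bootstrap draw of the "prior" conformational populations can be sketched as
below. This is an illustration under stated assumptions: the trajectory is split into equal
time-contiguous blocks, the block weights are drawn from a flat Dirichlet (all concentration
parameters equal to one), and each block's weight is spread evenly over its frames. The function
name and these choices are hypothetical, not taken from the text.

```python
import numpy as np

def bayesian_bootstrap_weights(n_frames, n_blocks, rng):
    """Draw one set of 'prior' frame populations from a flat Dirichlet over blocks."""
    blocks = np.array_split(np.arange(n_frames), n_blocks)
    block_weights = rng.dirichlet(np.ones(n_blocks))
    frame_weights = np.empty(n_frames)
    for w, idx in zip(block_weights, blocks):
        frame_weights[idx] = w / len(idx)    # spread the block weight evenly over its frames
    return frame_weights

rng = np.random.default_rng(2)
draws = [bayesian_bootstrap_weights(n_frames=1000, n_blocks=10, rng=rng) for _ in range(200)]
mean_weight = np.mean(draws, axis=0)
```

Each draw would seed one BELT calculation, and the MCMC traces from all draws would then be
pooled. Averaged over many draws, the frame weights fluctuate around the uniform value.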
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Appendix S7. Convergence Analysis 



Although more sophisticated convergence tests are available, we evaluated convergence of 
MCMC traces by visual analysis. A properly sampled and thinned model will appear similar 
to white noise, as observed in Fig. S2. A few interesting features are worth noting. The 
charmm27 and ff99 forcefields with MVN prior seem to suffer from increased correlation in 
their MCMC traces. 

Based on this and our other experience, we offer some suggestions for achieving converged 
traces. First, it seems that the maxent and Dirichlet priors are better able to achieve 
independent MCMC samples than the MVN prior. Second, poorer force fields (e.g. ff99, 
charmm27, and oplsaa) seem more prone to correlated MCMC samples. This is likely because 
the sampler is forced to explore "extreme" models — that is, models that lie further from the 
raw forcefield. Finally, we find that adding additional measurements — particularly ones that 
are correlated to previous measurements — leads to increased correlation within the MCMC 
traces. We think these observations should help guide users towards achieving convergence 
without excessive computational resources. 
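
As a simple complement to visual inspection, the correlation within an MCMC trace can be
quantified with its autocorrelation function. The sketch below is illustrative only: the
function name is hypothetical, and the two synthetic traces (white noise and a first-order
autoregressive series) merely stand in for a well-thinned and a poorly mixed trace.

```python
import numpy as np

def autocorrelation(trace, max_lag):
    """Normalized autocorrelation of a 1D trace for lags 0..max_lag."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / (n * var) for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
white = rng.normal(size=5000)            # stand-in for a well-thinned trace

sticky = np.empty(5000)                  # AR(1) stand-in for a correlated trace
sticky[0] = 0.0
for t in range(1, 5000):
    sticky[t] = 0.9 * sticky[t - 1] + rng.normal()

acf_white = autocorrelation(white, 5)
acf_sticky = autocorrelation(sticky, 5)
```

A trace that resembles white noise has autocorrelation near zero at all nonzero lags, whereas a
correlated trace decays slowly and suggests more thinning or longer sampling.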
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